This guide has been written for you to follow along, you will create your own version of this file as you go. It will work best if you have two screens - have this document open on one screen, and R Studio with the code and output on the other (main) screen.
Before we go any further, copy the code below into your console and run it. This will identify and install any missing packages that are required for this script to run correctly.
## If a package is installed, it will be loaded. If any
## are not, the missing package(s) will be installed from CRAN
## First specify the packages of interest
packages = c("dplyr", "tidyr", "lubridate", "zoo",
"ggplot2", "DT", "reactable", "plotly")
## Now installs if required
package.check <- lapply(
packages,
FUN = function(x) {
if (!require(x, character.only = TRUE)) {
install.packages(x, dependencies = TRUE)
}
}
)
Start by opening the Rmd file “developing script.Rmd” in R Studio. This is an unformatted version of the markdown file that created this guide. You run a markdown by ‘knit’ing it (look for the icon - it’s a ball of wool with a knitting needle in it). Click the ’knit’ button and look at the document it creates. How does it compare to this file? (The first time you knit an Rmd script you may receive warning messages about packages that are required but aren’t yet installed. We have tried to pre-empt this by running the code above, but if it still happens install those packages and knit again.)
As you work through this guide you will add formatting and content to your script until your knitted document looks like this one!
This guide does not go into the magic that happens between hitting the ‘knit’ button and the output file being produced. There are plenty of resources on the internet if you want to explore this further. For more details about all things markdown visit http://rmarkdown.rstudio.com.
This guide will start by looking at formatting your document, then will add in some data and charts, and improve how they look. The final section adds some interactivity to some of the charts.
Markdown is a way to create and format documents, the types of which include (but aren’t limited to) html, pdf, Word, dashboards, slides etc. Two major benefits of creating these using R Markdown are reproducibility and automation. Code can be embedded, so the creation of properly formatted text plus charts and visualisations all happen with one click. This training guide will focus exclusively on producing html output, but hopefully by the end of it you will have the confidence and skills to investigate other output formats.
R markdown uses the # symbol to create headers. A level one header uses one #, use ## for a level two header etc. Note how, in the script, the background colour of the row changes.
Exercise 1: Turn ‘Introduction’, ‘What is R Markdown?’ and ‘Formatting’ into level 1 headers. Apply subsequent header levels to the text that follows ‘Formatting’.
The asterisk is used to make text bold or italic (or both). For
italics, apply one * either side of the word/phrase, for bold use **
either side of the word/phrase. If you want both italics
and bold, apply ***.
In the same way, apply ^ for superscript and ~~ for strike-through (note
the 2 ~~ symbols).
Exercise 2:
Make this text italic
make this text bold
Really emphasise this by making it bold and italics
Apply a superscript to part of this text
Strike through this text
Just as in Word, you can create bulleted and numbered lists. For bullet points, use *, - or + at the start of each point to be bulleted. Indent and use + for sub-bullets.
Exercise 3a: turn the first list into a bulleted list with sub-bullet, and the second list into a numbered list.
Exercise 3b: Create numbered lists by starting each line with (@). The numbers will dynamically adjust.
You can use numbers to make a numbered list, but you need to add a full-stop after each number.
To create a line break you have to double space after the last sentence then press return.
Even if you press enter in the markdown script, it won’t appear as a new paragraph in the document unless you end the sentence with a double space. This line looks like it should be a new paragraph in the markdown code, but it isn’t once it renders to html.
Make text stand out by using block quotes
Preceed each line with >
This can be useful for headline statements
But remember to put two spaces after each sentence
Exercise 4: turn the previous four lines into block code
Use 3 asterisks to create a line break below this sentence.
The asterisk is turning out to be quite a versatile symbol!
A useful feature for any document that will be viewed electronically is linking to websites or email addresses. To display the complete URL, wrap it in angled brackets <>. A good starting point for learning more about R Markdown is http://rmarkdown.rstudio.com.
Instead, you might want a clickable link that displays some
alternative text. This uses a combination of square and round brackets -
wrap the readable text in square brackets, and follow it with the URL
link in round brackets. R Markdown: the definitive
guide is another really useful resource. This is the same method
used for creating a clickable email link:
[text to display](mailto:email@email.address)
Exercise 5:
Contact: (add some alt text and your email address as a hyperlink)
Go back to the Introduction section and make the link to rmarkdown active.
We are going to be using data from the NHS Scotland Open Data platform. Create some alt text and link it to opendata.nhs.scot
Knit and check your links work.
Your document should be looking more like this one, but it’s still missing the table of contents. We add that in the YAML header. While we are there, we will change the theme. Choose from any of the Bootswatch free themes, or download other templates.
Exercise 6: amend your YAML header as below. Note the indentation after ‘output’, and further indentation after ‘html_document’ - in YAML, this is essential so it knows to link it all back to the html document output. See if you like any of the other free themes from Bootswatch (cerulean, cosmo, cyborg, darkly, flatly, journal, lumen, paper, readable, sandstone, simplex, slate, spacelab, superhero, united or yeti).
output:
html_document:
theme: cosmo
toc: true
toc_float: yes
While you are in the YAML header, change the author and date too.
Now things start to get really interesting! We are going to import some data, do some wrangling and create some outputs to display. We will be using R to do all this, but R Markdown supports other languages too.
There are two ways of inserting code into a document: blocks of code
are wrapped up in a code chunk, but you can also insert objects inline
in text. You create a code chunk using the Insert option at
the top of the pane, the keyboard shortcut Ctrl + Alt + I
or manually, by wrapping it in three backticks at the start and end of
the chunk. You also need to use curly brackets to declare which language
the code is in.
```{r add-chunk-name}
# insert R code here
```
It’s good practice to name your code chunks (each chunk needs a unique name), keep it short but descriptive of what it is doing. Don’t use spaces, dots or underscores in chunks - use “-” if you need to use a separator.
Adding chunk options within the curly brackets can e.g. hide the
code, only show the results, evaluate the code or not (plus many more
options). If you want the same rules to apply to all code chunks you
would include it in the global options. Look at the very first code
chunk in the script - include = FALSE excludes the code
chunks from the output by default.
Use single backticks to run code inline with text. You still need to
include r but this time it doesn’t need to be in curly
brackets. For example, to print today’s date use
`r format(Sys.time(), '%d %B %Y')` inline with the rest of
your text.
Exercise 7:
If you don’t already have the package installed, run
remotes::install_github("Public-Health-Scotland/phsopendata", upgrade = "never")
in the console. This will install the PHS Open Data package, which we
will use to import some data.
Create a code chunk, name it and paste the following code into the chunk.
## libraries
library(dplyr)
library(tidyr)
library(phsopendata) # easily access open data from PHS
library(lubridate) # parse dates correctly
library(zoo)
## one way to get data from the PHS opendata repository (https://www.opendata.nhs.scot/) is by the resource id, found in the metadata section
# importing the last 12 months of monthly A&E data
monthly_ae <- get_resource(res_id = "2a4adc0a-e8e3-4605-9ade-61e13a85b3b9") %>%
mutate(month_end = ym(Month)) %>%
filter(between(month_end, max(month_end)- months(12), max(month_end))) %>%
select(TreatmentLocation, DepartmentType, month_end, NumberOfAttendancesAggregate, NumberMeetingTargetAggregate)
# grouping for use later on
monthly_ae_gp <- monthly_ae %>%
group_by(DepartmentType, month_end) %>%
summarise("Number Of Attendances" = sum(NumberOfAttendancesAggregate),
"Number Meeting Target" = sum(NumberMeetingTargetAggregate)) %>%
pivot_longer(
cols = starts_with("Number"),
names_to = "type",
values_to = "count"
) %>%
ungroup()
When you knit, it doesn’t look like anything has happened. But behind
the scenes, the data has been imported and filtered, and the object
monthly_ae is waiting to be used.
In the next code chunk we will create some simple plots, and also introduce tabsets. These are a useful way to include lots of content without the document becoming excessively long.
To create tabsets add {.tabset} to the header row. Then create two (or more, how many tabs do you want?) headers at the next level down, with a code chunk in each section.
You can add commentary before the next section heading.
And different commentary for this chart.
You aren’t limited to just displaying charts in tabsets - you might want to also include the data behind a chart, in a table.
Exercise 8:
{.tabset} to the Monthly Attendances at
A&E level 2 header.library(ggplot2)
pl_colours <- c("darkred", "steelblue")
plot1 <-ggplot(monthly_ae_gp %>% filter(DepartmentType == "Emergency Department"),
aes(x = month_end, y = count, group = type, colour = type)) +
geom_line() +
expand_limits(y = 0) +
labs(title = "Number of Attendances and 4 hour target at Emergency Departments",
x = "month ending",
y = "number of Attendances") +
scale_colour_manual(values = pl_colours) +
theme_minimal()
plot1
plot2 <-ggplot(monthly_ae_gp %>% filter(DepartmentType == "Minor Injury Unit or Other"),
aes(x = month_end, y = count, group = type, colour = type)) +
geom_line() +
expand_limits(y = 0) +
labs(title = "Number of Attendances and 4 hour target at Minor Injury Units",
x = "month ending",
y = "number of Attendances") +
scale_colour_manual(values = pl_colours) +
theme_minimal()
plot2
.tabset-fade and/or
.tabset-pills to {.tabset} (keep it all inside
the curly brackets).By now the script is getting quite long, and can be tricky to navigate. If you’ve been good at defining sections then you can toggle to show the document outline on the right-hand side of the script pane. Other points of good practice:
load all the libraries you need in the first code chunk. (I haven’t done that in this document, as I wanted you to see which package is used in each example)
similarly, load and wrangle the data early on in the script too, before you start creating content. Then the code chunks within the main text can be kept sparse, such as just calling a plot.
run each chunk in RStudio as you create it to check it works - it’s easier to troubleshoot this way, than following an error message created when trying to knit.
if there’s a lot of processing to be done then create this in
another R script instead, and ‘call’ it into the markdown by using
source.
be very careful using filepaths - the ‘start point’ of filepaths
can be different when running a chunk locally than when knitting. The
easiest way to get round that is to use the package here,
which always starts at the location of the project directory.
It’s all well and good being able to print a chart or table, but it’s
still a static object. Wouldn’t it be great if readers could interact
with those objects? The beauty of rendering the markdown script to html
is that we can! What follows are just a few examples of packages that
can add some interactivity to your reports. There aren’t any exercises,
but follow along with the script to see how to apply them. In your
script the following code chunks are currently set to not run when you
knit - delete the code eval = FALSE from each code chunk
(or change it to TRUE) so that the code does run from now
on.
This default output for displaying tables using DT is shown below,
but you can customise it (add options for exporting the data, change how
many rows are displayed, plus lots more). However, you can see it
doesn’t like the yearmon date format, and displays it as a
numeric. (There is probably a workaround…)
With reactable, you can group data without having to hard code it first, using the built-in aggregate functions or define your own. Clicking on a grouped item will unroll it.
We can apply all sorts of tricks to make our ggplots look fancy, but at the end of the day they are still just static plots. Plotly can bring them to life. Hovering over the lines will display data points, and you can zoom into a specific area, download the plot, plus other options available through the buttons top-right of the plot..
Plotly may not be able to render more advanced ggplots using the
ggplotly function, but you can build plots from scratch in
plotly. The language and terminology is different than ggplot, but
ultimately it gives you a lot more functionality.
This is only a brief introduction to R Markdown, there is so much more to explore! The suggestions below are outwith the scope of this guide, and are only a tiny fraction of what is possible.
Automate the same report for multiple groups (e.g. ICSs, LAs, hospitals, …). These are done by defining the parameters. Chapter 15 in R Markdown: the definitive guide describes this in more detail. This method can also be applied to other types of outputs, such as pdf, Word or Powerpoint documents.
Flexdashboard is an extension package to markdown that allows you to build dashboards in html. These are useful to display data visualisations, and include other components such as data tables, interactive maps, gauges, value boxes etc. It can be made up of multiple pages, or you can control how analysis and its conclusions are unveiled through the use of a storyboard.
Further customise the appearance of the html document by adding CSS styles and html formatting. For instance (look at the source code to see how this effect is created):
you can change the background colour of a block to further highlight key statements
and change the font colour for specific words/sentences
making your conclusions really stand out